Search for: All records

Creators/Authors contains: "Ko, E"


  1. Although Large Language Models (LLMs) succeed in human-guided conversations such as instruction following and question answering, the potential of LLM-guided conversations, where LLMs direct the discourse and steer the conversation's objectives, remains under-explored. In this study, we first characterize LLM-guided conversation by three fundamental components: (i) Goal Navigation; (ii) Context Management; and (iii) Empathetic Engagement, and propose GuideLLM as an instantiation. We then implement an interviewing environment for the evaluation of LLM-guided conversation. This environment spans a variety of topics for comprehensive interviewing evaluation, yielding roughly 1.4k turns of utterances, 184k tokens, and over 200 events mentioned per chatbot evaluation. We compare GuideLLM with six state-of-the-art LLMs, such as GPT-4o and Llama-3-70b-Instruct, in terms of interviewing quality and autobiography generation quality. For automatic evaluation, we derive user proxies from multiple autobiographies and employ LLM-as-a-judge to score LLM behaviors (a minimal sketch of such a judging setup appears after this list). We further conduct a human-subject experiment in which 45 participants chat with GuideLLM and the baselines, collecting feedback, preferences, and ratings on the quality of the conversations and autobiographies. Experimental results indicate that GuideLLM significantly outperforms the baseline LLMs in automatic evaluation and achieves consistently leading performance in human ratings.
    Free, publicly-accessible full text available February 10, 2026
  2. Although Large Language Models (LLMs) succeed in human-guided conversations such as instruction following and question answering, the potential of LLM-guided conversations, where LLMs direct the discourse and steer the conversation's objectives, remains largely untapped. In this study, we provide an exploration of the LLM-guided conversation paradigm. Specifically, we first characterize LLM-guided conversation by three fundamental properties: (i) Goal Navigation; (ii) Context Management; and (iii) Empathetic Engagement, and propose GuideLLM as a general framework for LLM-guided conversation. We then implement an autobiography interviewing environment, a common practice in Reminiscence Therapy, as one demonstration of GuideLLM. In this environment, various techniques are integrated with GuideLLM to enhance the autonomy of LLMs, such as a Verbalized Interview Protocol (VIP) and Memory Graph Extrapolation (MGE) for goal navigation, and therapy strategies for empathetic engagement (a toy memory-graph sketch appears after this list). We compare GuideLLM with baseline LLMs, such as GPT-4-turbo and GPT-4o, in terms of interviewing quality, conversation quality, and autobiography generation quality. Experimental results, encompassing both LLM-as-a-judge evaluations and a human subject experiment involving 45 participants, indicate that GuideLLM significantly outperforms the baseline LLMs in the autobiography interviewing task.
  3. Peer review of grant proposals is critical to the National Science Foundation (NSF) funding process for STEM disciplinary and education research. Despite this, scholars receive little training in effective and constructive proposal review beyond definitions of review criteria and an overview of strategies for avoiding bias and communicating clearly. Senior researchers often find that their reviewing skills develop over time, but variation in reviewers' starting points can diminish the value of reviews for their intended audiences: program officers, who make funding recommendations, and principal investigators, who drive the research or want to improve their proposals. Building on the journal review component of the Engineering Education Research Peer Review Training (EER PERT) project, which is designed to develop EER scholars' peer review skills through mentored reviewing experiences, this paper describes a program that provides professional development for proposal reviewing and presents initial evaluation results.
  4. This research study was situated within a peer review mentoring program in which novice reviewers were paired with mentors who are former National Science Foundation (NSF) program directors with experience running discipline-based education research (DBER) panels. Whether for a manuscript or a grant proposal, the outcome of peer review can greatly influence academic careers and the impact of research on a field. Yet the criteria upon which reviewers base their recommendations, and the processes they follow as they review, are poorly understood. Mentees reviewed three proposals previously submitted to the NSF and drafted pre-panel reviews addressing the proposals' intellectual merit and broader impacts, and their strengths and weaknesses relative to solicitation-specific criteria. After participating in one mock review panel, mentees could revise their pre-panel evaluations based on the panel discussion. Using a lens of transformative learning theory, this study sought to answer the following research questions: 1) What are the tacit criteria used to inform recommendations for grant proposal reviews among scholars new to the review process? 2) To what extent do these tacit criteria and subsequent recommendations change after participation in a mock panel review? Using a single case study approach to explore one mock review panel, we conducted document analyses of six mentees' reviews completed before and after their participation in the mock review panel. Findings suggest that reviewers primarily focus on the positive broader impacts proposed by a study and the level of detail within a submitted proposal. Although mentees made few changes to their reviews after the mock panel discussion, the changes that were made illustrate that reviewers more deeply considered the broader impacts of the proposed studies. These results can inform review panel practices as well as approaches to training that support new reviewers in DBER fields.
  5. This is the first of a series of studies exploring the relationship between disciplinary background and the weight reviewers give to various elements of a manuscript when determining publication recommendations. Research questions include: (1) To what extent are tacit criteria for determining the quality or value of EER manuscripts influenced by reviewers' varied disciplinary backgrounds and levels of expertise? and (2) To what extent does mentored peer review professional development influence reviewers' EER manuscript evaluations? Data were collected from 27 mentors and mentees in a peer review professional development program. Participants reviewed the same two manuscripts, using a form to identify strengths, weaknesses, and recommendations. Responses were coded by two researchers (70% inter-rater reliability; a worked IRR computation appears after this list). Our findings suggest that disciplinary background influences reviewers' evaluation of EER manuscripts. We also found evidence that professional development can improve reviewers' understanding of EER disciplinary conventions. A deeper understanding of the epistemological basis for manuscript reviews may reveal ways to strengthen professional preparation in engineering education as well as other disciplines.
  6. This paper describes the Engineering Education Research (EER) Peer Review Training (PERT) project, which is designed to develop EER scholars' peer review skills through mentored reviewing experiences. Supported by the National Science Foundation, the overall programmatic goals of the PERT project are to establish and evaluate a mentored reviewer program for 1) EER journal manuscripts and 2) EER grant proposals. Concurrently, the project seeks to explore how EER scholars develop schema for evaluating EER scholarship, whether these schema are shared in the community, and how schema influence recommendations made to journal editors during the peer review process. To accomplish these goals, the PERT project leveraged the previously established Journal of Engineering Education (JEE) Mentored Reviewer Program, where two researchers with little reviewing experience are paired with an experienced mentor to complete three manuscript reviews collaboratively. In this paper we report on focus group and exit survey findings from the JEE Mentored Reviewer Program and discuss revisions to the program in response to those findings.
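As a companion to the LLM-as-a-judge evaluation described in item 1, the sketch below shows one minimal way such a scoring loop could be wired up. It is an illustration only: the rubric dimensions mirror the three components named in the abstract, but the prompts, the JSON score format, and the call_llm stub are hypothetical stand-ins, not the paper's implementation.

    # Hypothetical sketch of an LLM-as-a-judge scoring loop for interview
    # transcripts. call_llm is a stand-in for any chat-completion API;
    # the rubric and prompts are illustrative, not taken from the paper.
    import json

    RUBRIC = ["goal_navigation", "context_management", "empathetic_engagement"]

    def call_llm(prompt: str) -> str:
        """Stand-in for a real chat-completion call (e.g., to GPT-4o)."""
        raise NotImplementedError("wire up your LLM client here")

    def judge_transcript(transcript: str) -> dict:
        # Ask the judge model to score each rubric dimension from 1 to 5
        # and return strict JSON so the result is machine-readable.
        prompt = (
            "You are evaluating an interviewer chatbot. Score the transcript "
            f"on {', '.join(RUBRIC)} from 1 (poor) to 5 (excellent). "
            'Reply with JSON like {"goal_navigation": 4, ...}.\n\n'
            f"Transcript:\n{transcript}"
        )
        return json.loads(call_llm(prompt))

    def run_evaluation(transcripts):
        # Average each dimension across all transcripts from one chatbot.
        totals = {dim: 0.0 for dim in RUBRIC}
        for t in transcripts:
            scores = judge_transcript(t)
            for dim in RUBRIC:
                totals[dim] += scores[dim]
        return {dim: totals[dim] / len(transcripts) for dim in RUBRIC}

In a setup like this, the user proxies mentioned in the abstract would generate the interviewee side of each transcript before the judge model scores it.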
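Item 2 names Memory Graph Extrapolation (MGE) as a goal-navigation technique, but the abstract gives no internals. The toy sketch below guesses at the general idea: record mentioned life events as graph nodes and steer the next question toward an uncovered event adjacent to what was just discussed. All class and method names are hypothetical, and the paper's actual mechanism is almost certainly more sophisticated.

    # Toy memory graph for steering an interview toward unexplored topics.
    # A guessed illustration of goal navigation, not the paper's MGE.
    from collections import defaultdict

    class MemoryGraph:
        def __init__(self):
            self.edges = defaultdict(set)   # event -> related events
            self.covered = set()            # events already discussed

        def add_event(self, event: str, related=()):
            # Link the new event to any related events, bidirectionally.
            for other in related:
                self.edges[event].add(other)
                self.edges[other].add(event)
            self.edges.setdefault(event, set())

        def mark_covered(self, event: str):
            self.covered.add(event)

        def next_topic(self):
            # Prefer an uncovered event adjacent to a covered one, so the
            # conversation extrapolates outward from what was just said.
            for event in self.covered:
                for neighbor in self.edges[event]:
                    if neighbor not in self.covered:
                        return neighbor
            # Otherwise fall back to any uncovered event.
            for event in self.edges:
                if event not in self.covered:
                    return event
            return None

    g = MemoryGraph()
    g.add_event("moved to Chicago", related=["first job"])
    g.mark_covered("moved to Chicago")
    print(g.next_topic())  # -> "first job"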
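Item 5 reports that two researchers coded responses with 70% IRR. The abstract does not say which statistic was used; the sketch below computes simple percent agreement, with Cohen's kappa as the usual chance-corrected alternative. The example labels are hypothetical.

    # Illustrative inter-rater reliability computation for two coders.
    from collections import Counter

    def percent_agreement(a, b):
        # Fraction of items on which the two coders assigned the same label.
        return sum(x == y for x, y in zip(a, b)) / len(a)

    def cohens_kappa(a, b):
        po = percent_agreement(a, b)  # observed agreement
        ca, cb = Counter(a), Counter(b)
        n = len(a)
        # Expected chance agreement from each coder's marginal label frequencies.
        pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
        return (po - pe) / (1 - pe)

    coder1 = ["strength", "weakness", "strength", "strength", "weakness"]
    coder2 = ["strength", "strength", "strength", "strength", "weakness"]
    print(percent_agreement(coder1, coder2))  # 0.8
    print(cohens_kappa(coder1, coder2))       # ~0.55

On these toy labels, raw agreement is 0.8 while kappa is noticeably lower, because chance agreement is high when a coding scheme has only two categories.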